Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds correct UTF-8 encoding to Net::BER::BerIdentifiedString #242

Merged
merged 1 commit into from
Dec 15, 2015

Conversation

andibachmann
Copy link

Dear Net-LDAP maintainers

Currently, net-ldap returns values containing umlauts (e.g. 'Müller') with an encoding as 'ASCII-8BIT'.
This is wrong. LDAP in version 3 should encode all data in 'UTF-8' and therefore when
casting returned ''string'' data into a Net::BER::BerIdentifiedString the encoding should be 'UTF-8'.

The current code tries to '#encode('UTF-8')' which results in an
Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8. This error is trapped
and the binary string is returned instead.

If we assume that the data coming from our LDAP/AD-Server is correctly encoded in 'UTF-8', there is
no need to use #encode. The only thing is to set correctly the encoding of the (string) data, i.e. to use force_encoding.

Example:

bin_str = "Müller".b
p [:bin_str, bin_str.encoding, bin_str.bytes, bin_str]
# => [:bin_str, #<Encoding:ASCII-8BIT>, [77, 195, 188, 108, 108, 101, 114], "M\xC3\xBCller"]

# Now try to 'encode' it:
bin_str.encode('UTF-8')
# =>Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8

# Better:
bin_str.force_encoding('UTF-8')
p [:bin_str, bin_str.encoding, bin_str.bytes, bin_str]
# => [:bin_str, #<Encoding:UTF-8>, [77, 195, 188, 108, 108, 101, 114], "Müller"]

I must admit that I have only checked the code (and added one test case) for Umlauts (and characters more or less covered by ISO-8859-1). I have no idea how to test against
Korean, Japanese, Russian, or Chinese encodings.

Nevertheless, I am pretty sure that the current code is bogus for any 'non-ASCII' characters.

Please let me know, if you need further details and I'd be happy to help you out.

regards
andi


assert bis.valid_encoding?, "should be a valid encoding"
assert_equal "UTF-8", bis.encoding.name
end

def test_ut8_data_in_utf8
def test_utf8_data_in_utf8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this.

@jch
Copy link
Member

jch commented Dec 14, 2015

@andibachmann thank you for taking the time to open this pull request and including helpful comments. I think this should be fine and compatible with the past changes in #212 because we're changing the internal string object. Is there anything else you would like me to review specifically before I merge?

@andibachmann
Copy link
Author

@jch, besides special Encodings like Japanese, Chinese and Korean, I'm pretty sure the code is OK.
thanks for the review!

@satoryu
Copy link
Collaborator

satoryu commented Dec 15, 2015

besides special Encodings like Japanese, Chinese and Korean, I'm pretty sure the code is OK.

Yes, I have a monkey patch as well as your changes does :)

@jch
Copy link
Member

jch commented Dec 15, 2015

Thanks all for chiming in. Merging.

jch added a commit that referenced this pull request Dec 15, 2015
Adds correct UTF-8 encoding to Net::BER::BerIdentifiedString
@jch jch merged commit ea21ef9 into ruby-ldap:master Dec 15, 2015
astratto pushed a commit to astratto/ruby-net-ldap that referenced this pull request Dec 18, 2015
Adds correct UTF-8 encoding to Net::BER::BerIdentifiedString
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Jan 12, 2016
=== Net::LDAP 0.13.0

* Set a connect_timeout for the creation of a socket
  {#243}[ruby-ldap/ruby-net-ldap#243]
* Update bundler before installing gems with bundler
  {#245}[ruby-ldap/ruby-net-ldap#245]
* Net::LDAP#encryption accepts string
  {#239}[ruby-ldap/ruby-net-ldap#239]
* Adds correct UTF-8 encoding to Net::BER::BerIdentifiedString
  {#242}[ruby-ldap/ruby-net-ldap#242]
* Remove 2.3.0-preview since ruby-head already is included
  {#241}[ruby-ldap/ruby-net-ldap#241]
* Drop support for ruby 1.9.3
  {#240}[ruby-ldap/ruby-net-ldap#240]
* Fixed capitalization of StartTLSError
  {#234}[ruby-ldap/ruby-net-ldap#234]
gtanzillo added a commit to gtanzillo/manageiq that referenced this pull request Aug 19, 2016
Version 0.14.0 of the gem (actually, starting with 0.13.0) contains a code change that fixes an encoding error
(Encoding::UndefinedConversionError) that happens when there are extended characters in a dn. The fix forces
utf-8 encoding instead of ASCII-8BIT for objects returned from the directory.

See ruby-ldap/ruby-net-ldap#242
https://bugzilla.redhat.com/show_bug.cgi?id=1367600
This was referenced Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants